Benefits of Topology Aware Mapping for Mesh Interconnects
Abstract
The fastest supercomputers today, such as Blue Gene/L, Blue Gene/P, Cray XT3 and XT4, are connected by a three-dimensional torus/mesh interconnect. Applications running on these machines can benefit from topology-awareness while mapping tasks to processors at runtime. By co-locating communicating tasks on nearby processors, the distance traveled by messages, and hence the communication traffic, can be minimized, thereby reducing communication latency and contention on the network. This paper describes preliminary work utilizing this technique and the performance improvements resulting from it in the context of an n-dimensional k-point stencil program. It shows that even for simple benchmarks, topology-aware mapping can have a significant impact on performance. Automated topology-aware mapping by the runtime using similar ideas can relieve the application writer of this burden and result in better performance. Preliminary work towards achieving this for a molecular dynamics application, NAMD, is also presented. Results on up to 32,768 processors of IBM’s Blue Gene/L, 4,096 processors of IBM’s Blue Gene/P and 2,048 processors of Cray’s XT3 support the ideas discussed in the paper.
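As a rough illustration of why co-location reduces the distance traveled by messages, the sketch below counts hops on a small 3D torus for a periodic 7-point stencil under two mappings. It is not the paper's benchmark or mapping algorithm: the 4x4x4 size, the function names, the random baseline, and the assumption that every message has the same size are all hypothetical choices made for the example.

```python
import itertools
import random


def torus_distance(a, b, dims):
    """Hop count between two processors on a torus (wraparound links assumed)."""
    return sum(min(abs(x - y), d - abs(x - y)) for x, y, d in zip(a, b, dims))


def stencil_neighbors(task, dims):
    """Six neighbors of a task in a periodic 3D 7-point stencil."""
    for axis in range(3):
        for step in (-1, 1):
            n = list(task)
            n[axis] = (n[axis] + step) % dims[axis]
            yield tuple(n)


def total_hops(mapping, dims):
    """Sum of hops over every task-to-neighbor message (equal message sizes assumed)."""
    return sum(
        torus_distance(mapping[t], mapping[n], dims)
        for t in mapping
        for n in stencil_neighbors(t, dims)
    )


dims = (4, 4, 4)  # hypothetical torus and task-grid size
tasks = list(itertools.product(*(range(d) for d in dims)))

# Topology-aware mapping: task (x, y, z) runs on processor (x, y, z),
# so every stencil neighbor is exactly one hop away.
aware = {t: t for t in tasks}

# Topology-oblivious baseline: tasks placed on processors at random.
rng = random.Random(0)
shuffled = tasks[:]
rng.shuffle(shuffled)
oblivious = dict(zip(tasks, shuffled))

aware_hops = total_hops(aware, dims)
oblivious_hops = total_hops(oblivious, dims)
print("topology-aware total hops:    ", aware_hops)
print("topology-oblivious total hops:", oblivious_hops)
```

With the identity mapping, each of the 64 tasks sends to 6 neighbors at one hop each, giving 384 hops in total; any other one-to-one placement can only match or exceed that figure. Hop count is only a proxy here, and the latency and contention effects the abstract describes would have to be measured on a real machine.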
Related articles
A Case Study of Communication Optimizations on 3D Mesh Interconnects
Optimal network performance is critical to efficient parallel scaling for communication-bound applications on large machines. With wormhole routing, no-load latencies do not increase significantly with the number of hops traveled. Yet we and others have recently shown that, in the presence of contention, message latencies can grow substantially. Hence task mapping strategies should take the topo...
Quantifying Network Contention on Large Parallel Machines
In the early years of parallel computing research, significant theoretical studies were done on interconnect topologies and topology-aware mapping for parallel computers. With the deployment of virtual cut-through, wormhole routing and faster interconnects, message latencies fell and research in the area died down. This article shows that network topology has become important again with the ...
Partially Reconfigurable Point-to-Point Interconnects in Virtex-II Pro FPGAs
Conventional rigid router-based networks on chip incur overheads from the large amount of logic resources they occupy and from topology embedding, i.e., the mapping of a logical network topology to a physical one. In this paper, we present an implementation of partially reconfigurable point-to-point (ρ-P2P) interconnects in FPGAs to overcome these overheads. In the presented implementation, arbitrary...
Cost-aware Topology Customization of Mesh-based Networks-on-Chip
The growing demand for supporting multiple applications leads to the integration of multiple IPs on a single chip, which makes finding a truly scalable communication architecture a critical concern. To this end, the Networks-on-Chip (NoC) paradigm has emerged as a promising solution to on-chip communication challenges within silicon-based electronics. Many of today’s NoC architectures are based...
Optimizing communication for Charm++ applications by reducing network contention
Optimal network performance is critical for efficient parallel scaling of communication-bound applications on large machines. No-load latencies do not increase significantly with the number of hops traveled when wormhole routing is deployed. Yet we and others have recently shown that, in the presence of contention, message latencies can grow substantially. Hence task mapping strategies should take...
Journal title:
- Parallel Processing Letters
Volume 18
Publication date: 2008